Feasibility Study for Ellipsis Resolution in Dialogues by Machine-Learning Technique

Authors

  • Kazuhide Yamamoto
  • Eiichiro Sumita
Abstract

A method is proposed for resolving the ellipses that appear in Japanese dialogues. The method resolves not only subject ellipses but also ellipses in the object and other grammatical cases. In this approach, a machine-learning algorithm selects the attributes needed for resolution, and the resulting decision tree is used as the actual ellipsis resolver. Blind tests show that the method achieves a resolution accuracy of 91.7% for indirect objects and 78.7% for subjects with a verb predicate. An examination of the decision tree shows that topic-dependent attributes are necessary for high-performance resolution, and that the indispensable attributes vary with the grammatical case. The effect of training-data size on decision-tree learning is also discussed.

1 Introduction

In machine translation systems, ellipses must be resolved when the source language omits the subject or another grammatical case that the target language must express. Ellipsis resolution is also a problem in information extraction and other natural language processing fields.

Several approaches have been proposed for resolving ellipses, which fall into two types: endophoric (intrasentential or anaphoric) ellipses and exophoric (extrasentential) ellipses. One of the major theoretical approaches to endophoric ellipsis is based on centering theory. However, its application to complex sentences has not been established, because most studies have examined its effectiveness only on sequences of simple sentences.

Several studies have also approached this problem empirically. Among them, Murata and Nagao (1997) proposed a scoring approach in which each constraint is manually assigned a score estimating its likelihood, and resolution is performed by totaling the points each candidate receives.
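To make the scoring idea concrete, here is a minimal sketch of constraint scoring in Python. It assumes each constraint is a yes/no test on a candidate with a hand-assigned integer score; the constraint names, feature names, scores, and candidates are all hypothetical and are not taken from Murata and Nagao's work.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    """A candidate referent for an omitted case element (hypothetical representation)."""
    referent: str
    features: Dict[str, bool]


@dataclass
class Constraint:
    """A constraint paired with a manually assigned score (hypothetical values)."""
    name: str
    test: Callable[[Candidate], bool]
    score: int


def total_score(candidate: Candidate, constraints: List[Constraint]) -> int:
    # Sum the scores of every constraint the candidate satisfies.
    return sum(c.score for c in constraints if c.test(candidate))


def resolve(candidates: List[Candidate], constraints: List[Constraint]) -> str:
    # Pick the highest-scoring candidate; max() keeps the first one on ties.
    return max(candidates, key=lambda cand: total_score(cand, constraints)).referent


# Hypothetical constraints and scores, for illustration only.
constraints = [
    Constraint("same_case", lambda c: c.features.get("same_case", False), 3),
    Constraint("is_topic", lambda c: c.features.get("topic", False), 2),
    Constraint("semantic_mismatch", lambda c: c.features.get("mismatch", False), -5),
]
candidates = [
    Candidate("Tanaka", {"topic": True}),                       # 2
    Candidate("hotel", {"same_case": True, "mismatch": True}),  # 3 - 5 = -2
    Candidate("speaker", {"same_case": True}),                  # 3
]
answer = resolve(candidates, constraints)
print(answer)  # speaker
```

The sketch also shows the cost the paper points to: every score is a manual decision, so changing the constraint set means re-tuning the numbers by hand.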
On the other hand, Nakaiwa and Shirai (1996) proposed an algorithm for resolving Japanese exophoric ellipses in written texts using semantic and pragmatic constraints. They claimed that 100% of the ellipses with exophoric referents could be resolved, but their experiment was a closed test with only a few samples. These approaches require manual effort to determine the scores or preferences of the given constraints.

Aone and Bennett (1995) applied a machine-learning technique to anaphora resolution in written texts. They treated endophoric ellipsis resolution as part of anaphora resolution, achieving at best approximately 40% recall and 74% precision on 200 test samples. However, they did not address exophoric ellipsis.

In contrast, we previously applied a machine-learning approach to ellipsis resolution (Yamamoto et al., 1997). In that work we resolved agent-case ellipses in dialogues within a limited topic and achieved approximately 90% accuracy. However, that result does not sufficiently establish the effectiveness of the decision tree, and the feasibility of the technique for resolving ellipses in each surface case also remains unclear.

We propose a method for resolving the ellipses that appear in Japanese dialogues. The method resolves not only subject ellipses but also ellipses in the object and other grammatical cases. In this approach, a machine-learning algorithm builds a decision tree by selecting the necessary attributes, and the decision tree is used as the actual ellipsis resolver. Another purpose of this paper is to discuss how effective the machine-learning approach is
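The following is a minimal sketch of a decision tree used as an ellipsis resolver. This excerpt does not state which induction algorithm or attributes the authors used, so the sketch substitutes plain ID3 (information-gain splitting) as a stand-in. The attributes (`verb_form`, `sentence_type`), their values, and the referent labels (`speaker`, `hearer`) are hypothetical, and the six training examples are invented toy data. Attribute selection happens implicitly: an attribute appears in the tree only if it is the most informative split at some node.

```python
import math
from collections import Counter
from typing import Dict, List, Union

Example = Dict[str, str]  # attribute name -> attribute value (hypothetical attributes)
Tree = Union[str, dict]   # a leaf label or an internal node


def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def info_gain(rows: List[Example], labels: List[str], attr: str) -> float:
    # Entropy reduction obtained by splitting on `attr`.
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows: List[Example], labels: List[str], attrs: List[str]) -> Tree:
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    # Attribute selection: keep only the most informative attribute at this node.
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    if info_gain(rows, labels, best) <= 0:
        return majority
    branches = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attrs if a != best],
        )
    return {"attr": best, "branches": branches, "default": majority}


def classify(tree: Tree, row: Example) -> str:
    # Walk down the tree; unseen attribute values fall back to the node's majority label.
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


# Invented toy data: the label is the referent of an omitted subject.
rows = [
    {"verb_form": "honorific", "sentence_type": "question"},
    {"verb_form": "honorific", "sentence_type": "statement"},
    {"verb_form": "humble", "sentence_type": "statement"},
    {"verb_form": "humble", "sentence_type": "question"},
    {"verb_form": "plain", "sentence_type": "question"},
    {"verb_form": "plain", "sentence_type": "statement"},
]
labels = ["hearer", "hearer", "speaker", "speaker", "hearer", "speaker"]

tree = build_tree(rows, labels, ["verb_form", "sentence_type"])
print(tree["attr"])  # verb_form
print(classify(tree, {"verb_form": "plain", "sentence_type": "statement"}))  # speaker
```

On this toy data the root splits on `verb_form`, and `sentence_type` is consulted only for plain verbs. This mirrors, in miniature, the abstract's observation that which attributes are indispensable depends on the case being resolved.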




Publication date: 1998